Abstract

Mobile applications are becoming increasingly ubiquitous 
and provide ever richer functionality on mobile devices. 
At the same time, such devices often enjoy strong connec- 
tivity with more powerful machines ranging from laptops 
and desktops to commercial clouds. This paper presents 
the design and implementation of CloneCloud, a system 
that automatically transforms mobile applications to ben- 
efit from the cloud. The system is a flexible application 
partitioner and execution runtime that enables unmodi- 
fied mobile applications running in an application-level 
virtual machine to seamlessly off-load part of their exe- 
cution from mobile devices onto device clones operating 
in a computational cloud. CloneCloud uses a combination 
of static analysis and dynamic profiling to optimally and 
automatically partition an application so that it migrates, 
executes in the cloud, and re-integrates computation in a 
fine-grained manner that makes efficient use of resources. 
Our evaluation shows that CloneCloud achieves up to a 21.2x speedup on the smartphone applications we tested, and that it allows different partitionings for different inputs and networks.

1 Introduction 

Mobile cloud computing is the next big thing. According to recent research by ABI Research [29], mobile cloud computing will deliver annual revenues of 20 billion dollars by the end of 2014. Mobile devices as simple as phones and as complex as mobile Internet devices with Internet access via multiple technologies, camera(s), GPS, and other sensors are the current computing wave, competing heavily with desktops and laptops for market share and popularity. The variety of flash-popular applications featured on on-line application stores, such as those of Apple, Google, and Microsoft, means that mobile users have no shortage of interesting things to do with their devices, for a low fee or even free.

This blossoming of the mobile application market is pushing mobile users beyond the usual staples of personal information management and music playback. Now mobile users look up songs by audio samples; play games; capture, edit, and upload video; analyze, index, and aggregate their mobile photo collections; analyze their finances; and manage their personal health and wellness. Also, new rich media, mobile augmented reality, and data analytics applications that require heavy computation are emerging. Such applications recruit increasing amounts of computation, storage, and communications from a still limited supply on mobile devices (certainly compared to tethered, grid-powered devices like desktops and laptops) and an extremely limited supply of energy. As a result, mobile applications end up in one of two camps: 1) they are designed for the lowest-common-denominator device, pushing most functionality to a service provider's site and leaving the device as a thin client that does little computing; or 2) they are built monolithically to run on the device, taking a long time to execute on low-end devices, even when a split client-server design might have been desirable.

[Figure 1 panels: (a) single-machine computation, with a process running on the phone's OS and HW; (b) distributed computation, with the phone's process paired with a clone VM (process, OS, virtual HW) running on a VMM and HW in the cloud.]

Figure 1: CloneCloud system model. CloneCloud automatically transforms a single-machine execution (mobile device computation) into a distributed execution (mobile device and cloud computation).

Fortunately, such devices often enjoy strong connectivity, especially in developed areas. What is more, there is increasingly broad availability of tethered computing, storage, and communications to spare on commercial clouds, at nearby wireless hotspots equipped with computational resources (e.g., cloudlets [32]), or at the user's PC and plugged-in laptop. Putting these two trends together, we recently made the case for a flexible architecture that enables the seamless use of ambient computation to augment mobile device applications [12]. In this paper, we take a first step towards realizing this vision by designing and implementing the first version of the CloneCloud system.

CloneCloud boosts unmodified mobile applications by seamlessly off-loading part of their execution from the mobile device onto device clones operating in a computational cloud¹. It is designed to serve as a platform for generic mobile-device processing as a service. Conceptually, our system automatically transforms a single-machine execution (e.g., computation on a smartphone) into a distributed execution (if needed) that is optimal given the network connection to the cloud, the relative processing capabilities of the mobile device and cloud, and the application's computing patterns (Figure 1).

The underlying motivation for such a system lies in the 
following intuition: as long as execution on the cloud is 
significantly faster than execution on the mobile device 
(or more reliable, more secure, etc.), paying the cost for 
sending the relevant data and code from the device to the 
cloud and back may be worth it. Unlike partitioning a ser- 
vice by design between an undemanding mobile client and 
a computationally expensive server in a provider's infras- 
tructure, CloneCloud late-binds this kind of partitioning. 
It makes sense to partition an application only when the metric (e.g., performance or energy) of the newly partitioned application is better than that of the existing application. In practice, the partitioning decision may be more
fine-grained than a yes/no answer (i.e., it may result in 
carving off different amounts of the original application 
for cloud execution). Furthermore, the decision may be 
impacted not only by the application itself, but also by the 
expected workload and the execution conditions, such as 
network connectivity and CPU speeds of both mobile and 
cloud devices. A fundamental design goal for CloneCloud 
is to allow such fine-grained flexibility on what to run 
where, which traditional client-server partitionings hard- 
wire early on in the development process. 

Another design goal for CloneCloud is to take the pro- 
grammer out of the business of application partitioning. 
While we conjecture that automatic partitioning is un- 
likely to produce optimized applications that can rival 
what a competent programmer would hand-code, we as- 
sert that competent programmers are also unlikely to will- 
ingly do such a hand-coding job for every possible set of 
circumstances a user may face. The kinds of applications 
on mobile platforms that are featured on application stores 
and gain flash popularity tend to be low-margin products, 
whose developers have little incentive to optimize man- 
ually for different combinations of architectures, network 
conditions, battery lives, and hosting infrastructures. Con- 
sequently, CloneCloud aims to make application partition- 
ing seamless, and based only on the deployed version of 
the application, without need for source code. 

Our work in this paper applies primarily to application-layer virtual machines (AppVMs), such as the Java VM, the Dalvik VM of the Android platform, and Microsoft's .NET. The relative ease of manipulating application executables and migrating pieces thereof to computing devices of diverging architectures made the AppVM model a promising first platform on which to explore our work. We expect some, but not all, of our design decisions to carry over when addressing such partitioning at lower layers of the execution stack, e.g., to UNIX-level processes, kernel-level process containers, or mobile hypervisors.

1 Throughout this paper, we use the term "cloud" in a broader sense to include the diverse ambient computational resources discussed above.

Figure 2: The CloneCloud prototype architecture.

The CloneCloud prototype described here meets all our design goals by rewriting an unmodified application executable. While the modified executable runs, individual threads migrate at automatically chosen points from the mobile device to a device clone in a cloud. There the thread executes, possibly accessing native features of the hosting platform such as the fast CPU, network, hardware accelerators, and storage. Eventually, the thread returns to the mobile device, along with any state it created abroad, which it merges back into the original process. The choice of where to migrate off and back onto the mobile device is made by a partitioning component, which uses static analysis to discover constraints on possible migration points and dynamic profiling to build a cost model for execution and migration. A mathematical optimizer chooses migration points that optimize execution time given the application and the cost model. Figure 2 shows the high-level architecture of our prototype.

Much research has attacked application partitioning and migration in the past (we present detailed related work in Section 7). We distill our novel contributions
here as follows. First, unlike traditional suspend-migrate- 
resume mechanisms BP for application migration, the 
CloneCloud migrator operates at thread granularity, an es- 
sential consideration for mobile applications, which tend 
to have features that must remain at the mobile device, 
such as those accessing the camera or managing the user 
interface. Second, unlike past application-layer VM migrators [8, 43], the CloneCloud migrator allows native system operations to execute both at the mobile device and at
its clones in the cloud, harnessing not only raw CPU cloud 
power, but also system facilities or specialized hardware. 
Third, unlike mostly programmer-assisted approaches to application partitioning, the CloneCloud partitioner automatically identifies costs and constraints through static and dynamic code analysis, without the programmer's help, annotations, or application refactoring.

Figure 3: A general architecture for an application-layer virtual machine.

Figure 4: Partitioning analysis framework.

In what follows, we first give some brief background on application-layer VMs (Section 2). We then present the design of CloneCloud's partitioning components (Section 3) and its distributed execution mechanism (Section 4). We describe our implementation (Section 5) and experimental evaluation of the prototype (Section 6). We survey related work in Section 7, discuss our future research agenda in Section 8, and conclude in Section 9.

2 Background: Application VMs 

An application-level VM is an abstract computing machine that provides hardware and operating system independence (Figure 3). Its instruction set consists of platform-independent bytecodes; an executable is a blob of bytecodes. The VM runtime executes the bytecodes of methods with threads. There is typically a separation between the virtual portion of an execution and the native portion; the former is expressed only in terms of objects directly visible to the bytecode, while the latter includes management machinery for the virtual machine itself, data and computation invoked on behalf of a virtual computation, as well as the process-level data of the OS process containing the VM. Interfacing between the virtual and the native portion happens via native interface frameworks.

Runtime memory is split between VM-wide and per- 
thread areas. The Method Area, which contains the types 
of the executing program and libraries as well as static 
variable contents, and the Heap, which holds all dynam- 
ically allocated data, are VM-wide. Each thread has its 
own Virtual Stack (stack frames of the virtual hardware), 
the Virtual Registers (e.g., the program counter), and the 
Native Stack (containing any native execution frames of a 
thread, if it has invoked native functions). 

Most computation, data structure manipulation, and memory management are done within the abstract machine. However, external processing such as file I/O, networking, and the use of local hardware such as sensors is done via APIs that punch through the abstract machine into the process's system call interface.

3 Partitioning 

The partitioning mechanism in CloneCloud aims to mod- 
ify an application executable by deciding where to exe- 
cute methods in the code. No special considerations are 
required for the executable beyond targeting the same ap- 
plication VM; that is, it need not be written in a partic- 
ular idiom, e.g., a dataflow language. The output of the 
partitioning mechanism is the executable with partition- 
ing points, optimal for a choice of execution conditions 
(network link characteristics between mobile device and 
cloud, relative CPU speeds). The partitioning mechanism 
can be run multiple times for different execution conditions, resulting in a database that maps execution conditions to partitionings. At runtime, the distributed execution mechanism we describe in Section 4 implements the partitioning chosen for the current execution conditions.

Partitioning of an application follows the conceptual workflow of Figure 4. Our partitioning framework combines static program analysis with dynamic program profiling to produce a partitioning that optimizes the chosen goal while meeting correctness constraints.

The first component, the Static Analyzer, identifies legal partition choices for the application executable, according to a set of constraints (Section 3.1.1). Constraints
codify the needs of the distributed execution engine used, 
as well as the particular usage model we target; however, 
different mechanisms can seamlessly be plugged into the 
partitioning component by changing these constraints. 

The second component, the Dynamic Profiler (Section 3.2), runs the input executable on different platforms (the mobile device and the cloud clone) under a set of
inputs, and returns a set of profiled executions. Profiled 
executions are used to compose a cost model for the ap- 
plication under different partitionings. 

Finally, the Optimization Solver finds, among the legal partitionings enabled by the static analyzer, one that minimizes an objective function, using the cost model derived by the profiler (Section 3.3). The resulting partitioning is used to modify the executable, yielding the final output of the partitioner. Partitioning is an offline process that generates a model the runtime uses.

class C {
    void a() {
        b();
        c();
    }
    void b() { } // lightweight
    void c() { } // expensive
}

class Main {
    public static void main(String[] args) {
        C c = new C();
        c.a();
    }
}

(a) program

(b) static control-flow graph; (c) partitioned graph

Figure 5: An example of a program, its corresponding static control-flow graph, and a partitioning.

3.1 Static Analyzer 

The partitioner uses static analysis to identify legal 
choices for placing migration and re-integration points in 
the code. In principle, these points could be placed any- 
where in the code, but we reduce the available choices 
to make the optimization problem tractable. In particular, 
we restrict migration and re-integration points to the en- 
try and exit points, respectively, of methods. In addition, 
to focus on our application program, we restrict these par- 
titioning points to methods of application classes as op- 
posed to methods of system classes (e.g., the core classes 
for Java) or native methods. 

Figure 5 shows an example of a program, relevant parts of its static control-flow graph, and a particular legal partitioning of the program. Class C has three methods. Method a() calls method b(), which performs lightweight processing, followed by method c(), which performs expensive processing. The static control-flow graph approximates control flow in the program (inferring exact control flow is undecidable, as program reachability is undecidable). The approximation is conservative in that if an execution of the program follows a certain path, then that path exists in the graph (but the converse typically does not hold). In the depicted static control-flow graph, only entry and exit nodes of methods are shown, labelled as <class name>.<method name>.<entry|exit>; other kinds of nodes (e.g., those corresponding to instructions) are omitted, since we restrict partitioning points to method entry and exit. The possible partitioning shown in Figure 5(c) runs the body of method c() on the clone and the rest of the program on the mobile device.



3.1.1 Constraints 

We next describe three properties required by the migra- 
tion component of any legal partitioning and explain how 
we use static analysis to obtain constraints that express 
these properties. 

Property 1. Methods that access specific features of a 
machine must be pinned to the machine. 

If a method uses a local resource such as the location service (e.g., GPS) or sensor inputs (e.g., microphones) in a mobile device, the method must be executed on the mobile device. This primarily concerns native methods, but also the main method of a program. The analysis marks the declaration of such methods with a special annotation M (for Mobile device). We manually identify such methods in the VM's API (e.g., VM API methods explicitly referring to the camera); this is done once for a given platform and is not repeated for each application. We also always mark the main method of a program. We refer to methods marked with M as the $V_M$ method set.
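The pinning rule amounts to a set computation. The Java sketch below is hypothetical: `DEVICE_APIS` stands in for the manually curated platform list (the two Android method names are only examples), the per-method API-call table stands in for a scan of the executable, and treating any application method that directly calls a pinned API as itself pinned is our assumption about how the annotation reaches application code.

```java
import java.util.*;

/**
 * Hypothetical sketch of Property 1: building the pinned set V_M for
 * application methods. DEVICE_APIS stands in for the hand-curated,
 * per-platform list of VM API methods that touch device-only features.
 */
class PinnedMethods {
    static final Set<String> DEVICE_APIS = Set.of(
            "android.hardware.Camera.open",
            "android.location.LocationManager.getLastKnownLocation");

    /**
     * apiCallsByMethod: for each application method, the VM API methods it calls
     * directly (assumed to come from a scan of the executable).
     * Returns V_M: methods that call a device API, plus the main method.
     */
    static Set<String> computeVM(Map<String, Set<String>> apiCallsByMethod, String mainMethod) {
        Set<String> vm = new TreeSet<>();
        vm.add(mainMethod); // the main method is always pinned
        for (Map.Entry<String, Set<String>> e : apiCallsByMethod.entrySet()) {
            if (!Collections.disjoint(e.getValue(), DEVICE_APIS)) {
                vm.add(e.getKey());
            }
        }
        return vm;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> calls = new HashMap<>();
        calls.put("C.takePhoto", Set.of("android.hardware.Camera.open"));
        calls.put("C.resize", Set.of("java.lang.Math.max"));
        System.out.println(computeVM(calls, "Main.main")); // [C.takePhoto, Main.main]
    }
}
```

Because the list of device APIs is per platform, it is written once and reused for every application, matching the text.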

Property 2. Methods that share native state must be colo- 
cated at the same machine. 

An application may have native methods that create and access state below the VM. Native methods may share native state. Such methods must be collocated on the same machine, because our migration component does not migrate native state (Section 4.1). To avoid a manual-annotation burden, native state annotations are inferred automatically by the following simple approximation, which works well in practice: we assign a unique annotation $Nat_C$ to all native methods declared in the same class C; the set $V_{Nat_C}$ contains all methods with that annotation.
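The class-based approximation for shared native state can be sketched as a grouping step. The `NativeGroups` class and its `Method` descriptor are hypothetical; a real analyzer would read the same information (declaring class, method name, native flag) from the executable.

```java
import java.util.*;

/** Hypothetical sketch of Property 2: grouping native methods by declaring class. */
class NativeGroups {
    /** Minimal method descriptor; a real tool would read these fields from bytecode. */
    static final class Method {
        final String className, name;
        final boolean isNative;

        Method(String className, String name, boolean isNative) {
            this.className = className;
            this.name = name;
            this.isNative = isNative;
        }
    }

    /** Maps each class C to V_{Nat_C}: the native methods declared in C. */
    static Map<String, Set<String>> group(List<Method> methods) {
        Map<String, Set<String>> groups = new TreeMap<>();
        for (Method m : methods) {
            if (m.isNative) { // non-native methods are unconstrained by Property 2
                groups.computeIfAbsent(m.className, k -> new TreeSet<>())
                      .add(m.className + "." + m.name);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Method> ms = List.of(
                new Method("Codec", "init", true),
                new Method("Codec", "decode", true),
                new Method("Codec", "helper", false),
                new Method("Gps", "read", true));
        System.out.println(group(ms)); // {Codec=[Codec.decode, Codec.init], Gps=[Gps.read]}
    }
}
```

Every method in one group must then be placed on the same machine, whichever machine that is.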

Property 3. Prevent cyclic migration. 

With one phone and one clone, this implies that there should be no nested suspends and no nested resumes. Once a program is suspended for migration at the entry point of a method, the program should not be suspended again without a resume; i.e., migration and re-integration points must be executed alternately. To enforce this property, the static analysis builds the static control-flow graph of an application, capturing the caller-callee method relation; it exports this as two relations: $DC(m_1, m_2)$, read as "method $m_1$ Directly Calls method $m_2$," and $TC(m_1, m_2)$, read as "method $m_1$ Transitively Calls method $m_2$," which is the transitive closure of DC. For the example in Figure 5, this ensures that if partitioning points are placed in a(), then they are not placed in b() or c(). The remaining legal partitionings place no migration point at a(), but at b(), at c(), or at both b() and c().
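For the Figure 5 program, the effect of Property 3 can be checked by brute force. This sketch (all names hypothetical) computes TC from the DC edges by graph search and enumerates the non-empty sets of migration points among a(), b(), and c() in which no chosen method transitively calls another; main is excluded because it is always pinned.

```java
import java.util.*;

/** Hypothetical sketch of Property 3 on the Figure 5 program. */
class CyclicMigrationCheck {
    // DC edges of Figure 5: caller -> direct callees.
    static final Map<String, List<String>> DC = Map.of(
            "main", List.of("a"),
            "a", List.of("b", "c"),
            "b", List.of(),
            "c", List.of());

    /** TC(m1, m2): m2 is reachable from m1 through one or more DC edges. */
    static boolean tc(String m1, String m2) {
        Deque<String> work = new ArrayDeque<>(DC.getOrDefault(m1, List.of()));
        Set<String> seen = new HashSet<>();
        while (!work.isEmpty()) {
            String m = work.pop();
            if (m.equals(m2)) return true;
            if (seen.add(m)) work.addAll(DC.getOrDefault(m, List.of()));
        }
        return false;
    }

    /** Legal iff no chosen migration point transitively calls another one. */
    static boolean legal(Set<String> migrated) {
        for (String m1 : migrated)
            for (String m2 : migrated)
                if (tc(m1, m2)) return false;
        return true;
    }

    /** Enumerates all legal non-empty subsets of the candidate methods. */
    static List<Set<String>> legalSets(List<String> candidates) {
        List<Set<String>> out = new ArrayList<>();
        for (int mask = 1; mask < (1 << candidates.size()); mask++) {
            Set<String> s = new TreeSet<>();
            for (int i = 0; i < candidates.size(); i++)
                if ((mask & (1 << i)) != 0) s.add(candidates.get(i));
            if (legal(s)) out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(legalSets(List.of("a", "b", "c"))); // [[a], [b], [c], [b, c]]
    }
}
```

The four sets printed match the legal partitionings listed above.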
(a) trace; (b) profile tree



Figure 6: An example of an execution trace (a) and its 
corresponding profile tree (b). Edge costs are not shown. 

3.2 Dynamic Profiler 

The job of the profiler is to collect the data that will be 
used to construct a cost model for the application under 
different execution settings. The cost metric can be one of several quantities, including energy expenditure or resource footprint; in the prototype presented here, we focus on execution time.

The profiler is invoked on multiple executions of the application, each using a different set of input data (e.g., command-line arguments and user-interface events), and each executed once on the mobile device and once on the clone in the cloud. The profiler outputs a set S of executions and, for each execution, two profile trees, T and T', from the mobile device and the clone, respectively.

A profile tree is a compact representation of an execu- 
tion on a single platform. It is a tree with one node for each 
method invocation in the execution; it is rooted at the start- 
ing (user-defined) method invocation of the application 
(e.g., main). Specific method calls in the execution are 
represented as edges from the node of the caller method 
invocation (parent) to the nodes of the callees (children); 
edge order is not important. Each node is annotated with 
the cost of its particular invocation in the cost metric (ex- 
ecution time in our case). In addition to its called-method 
children, every non-leaf node also has a leaf child called 
its residual node. The residual node i' for node i repre- 
sents the residual cost of invocation i that is not due to the 
calls invoked within i; in other words, node i' represents 
the cost of running the body of code excluding the costs 
of the methods called by it. Finally, each edge is annotated with the state size at the time of invocation of the child node, plus the state size at the end of that invocation; this is the amount of data that the migrator (Section 4.1) would need to capture and transmit in both directions if the edge were a migration point. Edges between a node and its residual child have no cost.
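A minimal Java sketch of a profile-tree node, assuming execution time in milliseconds as the cost metric and storing each edge's state-size annotation on the child node; `ProfileNode` and its fields are illustrative, not the prototype's data structure. The residual node is not stored explicitly: its cost is computed as the node's cost minus its children's costs, which for a leaf is simply the node's own cost.

```java
import java.util.*;

/** Illustrative profile-tree node; the residual node i' is derived, not stored. */
class ProfileNode {
    final String method;
    final double cost;         // inclusive execution time of this invocation (ms)
    final long edgeStateBytes; // state size at invocation plus at return, on the edge from the parent
    final List<ProfileNode> children = new ArrayList<>();

    ProfileNode(String method, double cost, long edgeStateBytes) {
        this.method = method;
        this.cost = cost;
        this.edgeStateBytes = edgeStateBytes;
    }

    /** Adds a callee invocation; edge order does not matter. */
    ProfileNode call(ProfileNode child) {
        children.add(child);
        return this;
    }

    /** Cost of the residual node i' (equals the node's own cost for a leaf). */
    double residual() {
        double r = cost;
        for (ProfileNode c : children) r -= c.cost;
        return r;
    }

    public static void main(String[] args) {
        // Figure 6 shape with made-up timings: main calls a twice; the first a calls b and c.
        ProfileNode first = new ProfileNode("a", 3, 4096)
                .call(new ProfileNode("b", 1, 512))
                .call(new ProfileNode("c", 1.5, 2048));
        ProfileNode root = new ProfileNode("main", 9, 0)
                .call(first)
                .call(new ProfileNode("a", 4, 1024));
        System.out.println(root.residual());  // 2.0 (main')
        System.out.println(first.residual()); // 0.5 (a')
    }
}
```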

Figure 6 is an example of an execution trace and its corresponding profile tree: a is called twice in main; one call of a invokes b and c, and the other invokes no other method. Each tree node on the right holds the execution time of the corresponding method in the trace (the length of the square bracket on the left). main' and a' are residual nodes; each holds the difference between the value of its parent node and the sum of its sibling nodes. For example, node main' holds the value $t_3 - t_2 = (t_4 - t_1) - ((t_4 - t_3) + (t_2 - t_1))$.

To fill in profile trees, we temporarily instrument 
method entry and exit points during each profile run on 
each platform. We focus only on application code to have 
low profiling overhead; we treat system or library meth- 
ods as inline code executed in the body of the calling ap- 
plication method. For our execution-time cost metric, we 
collect timings at method entry and exit points, which we 
process trivially to fill in tree node annotations. For profile trees executed at the clone, we leave edge costs set to 0 (since those edges do not initiate migration). For mobile-device trees, we perform the suspend-and-capture operation of the migrator (Section 4.1), measure the state size, and discard the captured state, both when invoking the child node and when returning from it. Recall that for every execution E, we capture two profile trees, one per platform, with different annotations.
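The entry/exit timing step can be sketched as a stack replay: an entry event opens a child of the innermost open invocation, and an exit event closes it. The event representation and class names below are assumptions for illustration. Because only application methods are instrumented, time spent in system or library code simply accrues to the enclosing application invocation.

```java
import java.util.*;

/** Hypothetical sketch: building profile-tree nodes from entry/exit timestamps. */
class TraceToTree {
    static final class Node {
        final String method;
        final long enterTime;
        long exitTime;
        final List<Node> children = new ArrayList<>();

        Node(String method, long enterTime) {
            this.method = method;
            this.enterTime = enterTime;
        }

        long cost() { return exitTime - enterTime; }
    }

    /** One instrumentation event; method is null for an exit (assumed format). */
    static final class Event {
        final String method;
        final long time;

        Event(String method, long time) {
            this.method = method;
            this.time = time;
        }
    }

    static Event enter(String method, long time) { return new Event(method, time); }
    static Event exit(long time) { return new Event(null, time); }

    /** Replays properly nested enter/exit events with a stack; returns the root invocation. */
    static Node build(List<Event> events) {
        Deque<Node> open = new ArrayDeque<>();
        Node root = null;
        for (Event ev : events) {
            if (ev.method != null) {
                Node n = new Node(ev.method, ev.time);
                if (open.isEmpty()) root = n;
                else open.peek().children.add(n); // callee of the innermost open invocation
                open.push(n);
            } else {
                open.pop().exitTime = ev.time;    // exit closes the innermost open invocation
            }
        }
        return root;
    }

    public static void main(String[] args) {
        // Made-up trace with the shape of Figure 6 (times in ms).
        Node root = build(List.of(
                enter("main", 0), enter("a", 1), enter("b", 2), exit(3),
                enter("c", 3), exit(4), exit(5), enter("a", 6), exit(9), exit(10)));
        System.out.println(root.cost() + " " + root.children.size()); // 10 2
    }
}
```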

For each invocation i in profiling execution E, we define a computation cost $C_c(i, l)$ and a migration cost $C_s(i)$, where l is the location of the invocation (0 for the mobile device, 1 for the clone). We fill in $C_c(i, l)$ from the corresponding profile tree collected at location l: if i is a leaf profile tree node, we set $C_c(i, l)$ to the annotation of that node; otherwise, we set it to the annotation of the residual node i'. We set $C_s(i)$ to the cost of making invocation i a migrant invocation. This cost is the sum of a suspend/resume cost and a transfer cost. The former is the time required to suspend and resume a thread. The latter is a volume-dependent cost: the time it takes to capture, serialize, transmit, deserialize, and reinstantiate state of a particular size (assuming, for simplicity, that all objects have the same such cost per byte). We precompute this per-byte cost² and use the edge annotations from the mobile-device profile tree to calculate the cost.
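As a small worked example of $C_s(i)$, the sketch below treats the suspend/resume term as one measured constant for the whole round trip (an assumption; the text does not say how that term is split) and multiplies the edge's state-size annotation by a single per-byte cost. `MigrationCost` and the numbers in `main` are hypothetical.

```java
/**
 * Hypothetical sketch of C_s(i) = suspend/resume cost + per-byte cost * state size,
 * using one precomputed per-byte cost for all objects.
 */
class MigrationCost {
    final double suspendResumeMs; // measured suspend + resume time (one round-trip constant, assumed)
    final double perByteMs;       // measured capture/serialize/transmit/deserialize/reinstantiate cost per byte

    MigrationCost(double suspendResumeMs, double perByteMs) {
        this.suspendResumeMs = suspendResumeMs;
        this.perByteMs = perByteMs;
    }

    /** edgeStateBytes: state size at invocation plus at return, from the mobile-device tree. */
    double cs(long edgeStateBytes) {
        return suspendResumeMs + perByteMs * edgeStateBytes;
    }

    public static void main(String[] args) {
        // Made-up measurements: 10 ms to suspend/resume, 1 ms per KiB transferred.
        MigrationCost model = new MigrationCost(10.0, 1.0 / 1024);
        // 2 KiB shipped to the clone at invocation and 3 KiB shipped back at return.
        System.out.println(model.cs(2048 + 3072)); // 15.0
    }
}
```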

3.3 Optimization Solver 

The purpose of our optimizer is to pick which application methods to migrate from the mobile device to the clone, so as to minimize the expected cost of the partitioned application. Given a particular execution E and its two profile trees, T on the mobile device and T' on the clone, one might intuitively picture this task as optimally replacing annotations in T with those in T', so as to minimize the total node and edge cost of the hybrid profile tree. Our static analysis dictates the legal ways to fetch annotations from T' into T, and our dynamic profiling dictates the actual trees T and T'. We do not differentiate among different executions E in the execution set S; we consider them all equiprobable, although in practice one might assign non-uniform frequencies to match a particular expected workload.

2 One could also estimate this per-byte cost from memory, processor, and storage speeds, as well as network latency and bandwidth, but we took the simpler approach of just measuring it.

More specifically, the output of our optimizer is a value assignment to binary decision variables R(m), one for every method m in the application. If the optimizer chooses R(m) = 1, then the partitioner will place a migration point at the entry into the method and a re-integration point at the exit from the method. If the optimizer chooses R(m) = 0, method m is unmodified in the application binary. For simplicity and to constrain the optimization problem, our migration strategy chooses to migrate either all or none of the invocations of a method. Despite its simplicity, this conservative strategy provides substantial benefits (Section 6); we leave finer-grained differentiation by call stack, method arguments, etc., to future work.

Not all partitioning choices for R(·) are legal (Section 3.1.1). To express these constraints in the optimization problem, we define an auxiliary decision variable L(m) indicating the location of every method m (0 for the mobile device, 1 for the clone), and three relations: I, as well as DC and TC computed during static analysis. I(i, m) is read as "i is an invocation of method m," and is trivially defined from the profile runs. Whereas DC and TC are computed once for each application, I is updated with new invocations only when the set S of profiling executions changes.

Using the decision variables R(·), the auxiliary decision variables L(·), the method sets $V_M$ and $V_{Nat_C}$ for all classes C defined during static analysis, and the relations I, DC, and TC from above, we formulate the optimization constraints as follows:

$$L(m_1) \neq L(m_2), \quad \forall m_1, m_2 : DC(m_1, m_2) = 1 \wedge R(m_2) = 1 \tag{1}$$

$$L(m) = 0, \quad \forall m \in V_M \tag{2}$$

$$L(m_1) = L(m_2), \quad \forall m_1, m_2, C : m_1, m_2 \in V_{Nat_C} \tag{3}$$

$$R(m_2) = 0, \quad \forall m_1, m_2 : TC(m_1, m_2) = 1 \wedge R(m_1) = 1 \tag{4}$$

The first is a soundness constraint. Constraint (1) requires that if a method causes migration to happen, it cannot be collocated with its callers. The remaining three correspond to the three properties defined in the static analysis. Constraint (2) requires that all methods pinned at the mobile device run on the mobile device (Property 1). Constraint (3) requires that methods dependent on the native state of the same class C are collocated, at either location (Property 2). And constraint (4) requires that no method transitively called by a migrated method is itself migrated (Property 3).
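The four constraints can be checked directly for a candidate assignment, which is a useful sanity test on small examples. In this hypothetical `ConstraintCheck` sketch, R(·) is represented by the set of migrated methods and L(·) by the set of methods placed on the clone (L = 1); it checks an assignment rather than searching for one.

```java
import java.util.*;

/** Hypothetical checker for constraints (1)-(4) on one candidate assignment. */
class ConstraintCheck {
    final Map<String, List<String>> dc;   // DC: caller -> direct callees
    final Set<String> pinned;             // V_M
    final List<Set<String>> nativeGroups; // one V_{Nat_C} per class C

    ConstraintCheck(Map<String, List<String>> dc, Set<String> pinned,
                    List<Set<String>> nativeGroups) {
        this.dc = dc;
        this.pinned = pinned;
        this.nativeGroups = nativeGroups;
    }

    /** TC(m1, m2): m2 is reachable from m1 through one or more DC edges. */
    boolean tc(String m1, String m2) {
        Deque<String> work = new ArrayDeque<>(dc.getOrDefault(m1, List.of()));
        Set<String> seen = new HashSet<>();
        while (!work.isEmpty()) {
            String m = work.pop();
            if (m.equals(m2)) return true;
            if (seen.add(m)) work.addAll(dc.getOrDefault(m, List.of()));
        }
        return false;
    }

    /** migrated = {m : R(m) = 1}; onClone = {m : L(m) = 1}. */
    boolean legal(Set<String> migrated, Set<String> onClone) {
        // (1) a migrated method never shares a location with a direct caller
        for (Map.Entry<String, List<String>> e : dc.entrySet()) {
            for (String callee : e.getValue()) {
                if (migrated.contains(callee)
                        && onClone.contains(e.getKey()) == onClone.contains(callee)) {
                    return false;
                }
            }
        }
        // (2) pinned methods stay on the mobile device (L = 0)
        for (String m : pinned) {
            if (onClone.contains(m)) return false;
        }
        // (3) methods sharing native state are all on the same side
        for (Set<String> group : nativeGroups) {
            long remote = group.stream().filter(onClone::contains).count();
            if (remote != 0 && remote != group.size()) return false;
        }
        // (4) no migration point is transitively called by another one
        for (String m1 : migrated) {
            for (String m2 : migrated) {
                if (tc(m1, m2)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dc = Map.of("main", List.of("a"), "a", List.of("b", "c"));
        ConstraintCheck check = new ConstraintCheck(dc, Set.of("main"), List.of());
        System.out.println(check.legal(Set.of("c"), Set.of("c")));                 // true
        System.out.println(check.legal(Set.of("c"), Set.of()));                    // false: (1)
        System.out.println(check.legal(Set.of("a", "c"), Set.of("a", "b", "c")));  // false
    }
}
```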

The cost of a (legal) partitioning R(·) of execution E is defined as follows, in terms of the auxiliary variables L(·), the relation I, and the cost variables $C_c$ and $C_s$ from the dynamic profiler:

$$C(E) = \mathit{Comp}(E) + \mathit{Migr}(E)$$

$$\mathit{Comp}(E) = \sum_{i \in E} \sum_{m} \Big[ (1 - L(m))\, I(i,m)\, C_c(i,0) + L(m)\, I(i,m)\, C_c(i,1) \Big]$$

$$\mathit{Migr}(E) = \sum_{i \in E} \sum_{m} R(m)\, I(i,m)\, C_s(i)$$

Comp(E) is the computation cost of the partitioned execution E, and Migr(E) is its migration cost. For every invocation $i \in E$, the computation cost takes its value from the mobile-device tree annotation $C_c(i, 0)$ if the method m being invoked is to run on the mobile device, or from the clone tree annotation $C_c(i, 1)$ otherwise. The migration cost sums the individual migration costs of only those invocations whose methods are migration points.

Finally, the optimization objective is to choose R(·) so as to minimize $\sum_{E \in S} C(E)$. We use a standard integer linear programming (ILP) solver to solve this optimization problem with the above constraints.
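The prototype hands this problem to an ILP solver. Purely as an illustration of the objective, the sketch below substitutes exhaustive enumeration, which is only feasible for a handful of methods. It assumes a tree-shaped call graph (each method has one caller, as in Figure 5), derives L(·) from R(·) by placing a method on the clone when it or one of its callers is a migration point, and assumes each method is invoked once per execution; all cost numbers are made up. With that derivation, constraints (1) and (2) hold by construction for this example, so only constraint (4) needs an explicit check.

```java
import java.util.*;

/**
 * Illustrative stand-in for the ILP step on the Figure 5 program: enumerates
 * migration sets and keeps the one with the lowest total cost over S.
 */
class ExhaustivePartitioner {
    // Caller of each method in a tree-shaped call graph; main is the root.
    static final Map<String, String> CALLER = Map.of("a", "main", "b", "a", "c", "a");
    // main is pinned (V_M), so it is never a candidate migration point.
    static final List<String> CANDIDATES = List.of("a", "b", "c");

    /** L(m) = 1 iff m or one of its (transitive) callers is a migration point. */
    static boolean onClone(String m, Set<String> migrated) {
        for (String x = m; x != null; x = CALLER.get(x)) {
            if (migrated.contains(x)) return true;
        }
        return false;
    }

    /** Constraint (4): no migration point may be transitively called by another. */
    static boolean legal(Set<String> migrated) {
        for (String m : migrated) {
            for (String x = CALLER.get(m); x != null; x = CALLER.get(x)) {
                if (migrated.contains(x)) return false;
            }
        }
        return true;
    }

    /** C(E); costs.get(m) = {C_c(i,0), C_c(i,1), C_s(i)} for m's single invocation i. */
    static double cost(Map<String, double[]> costs, Set<String> migrated) {
        double total = 0;
        for (Map.Entry<String, double[]> e : costs.entrySet()) {
            double[] k = e.getValue();
            total += onClone(e.getKey(), migrated) ? k[1] : k[0]; // Comp(E) term
            if (migrated.contains(e.getKey())) total += k[2];      // Migr(E) term
        }
        return total;
    }

    /** Returns the legal migration set minimizing the sum of C(E) over E in S. */
    static Set<String> best(List<Map<String, double[]>> s) {
        Set<String> bestSet = Set.of();
        double bestCost = Double.POSITIVE_INFINITY;
        for (int mask = 0; mask < (1 << CANDIDATES.size()); mask++) {
            Set<String> r = new TreeSet<>();
            for (int i = 0; i < CANDIDATES.size(); i++) {
                if ((mask & (1 << i)) != 0) r.add(CANDIDATES.get(i));
            }
            if (!legal(r)) continue;
            double total = 0;
            for (Map<String, double[]> e : s) total += cost(e, r);
            if (total < bestCost) {
                bestCost = total;
                bestSet = r;
            }
        }
        return bestSet;
    }

    public static void main(String[] args) {
        // Made-up per-method costs in ms: {mobile, clone, migration}, for two inputs.
        Map<String, double[]> large = Map.of(
                "main", new double[]{2, 1, 0}, "a", new double[]{1, 1, 30},
                "b", new double[]{2, 1, 5}, "c", new double[]{100, 10, 20});
        Map<String, double[]> small = Map.of(
                "main", new double[]{2, 1, 0}, "a", new double[]{1, 1, 30},
                "b", new double[]{2, 1, 5}, "c", new double[]{4, 1, 20});
        System.out.println(best(List.of(large, small))); // [c]
    }
}
```

With these inputs, off-loading c() wins over the whole set S even though the small input on its own is cheapest to run entirely on the device.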

4 Distributed Execution 

The purpose of the distributed execution mechanism in CloneCloud is to implement a specific partitioning of an application process running inside an application-layer virtual machine, as determined during partitioning (Section 3).

The lifecycle of a partitioned application is as fol- 
lows. When the user attempts to launch a partitioned 
application, current execution conditions (availability of 
cloud resources and network link characteristics between 
the mobile device and the cloud) are looked up in a 
database of pre-computed partitions. The lookup result 
is a binary, modified with particular migration and re- 
integration points (special VM instructions in our pro- 
totype), which is then launched in a new process. When 
execution of the process on the mobile device reaches a 
migration point, the executing thread is suspended and 
its state (including virtual state, program counter, regis- 
ters, and stack) is packaged and shipped to a synchronized 
clone. There, the thread state is instantiated into a new 
thread with the same stack and reachable heap objects, 
and then resumed. When the migrated thread reaches a re- 
integration point, it is similarly suspended and packaged 
as before, and then shipped back to the mobile device. 
Finally, the returned packaged thread is merged into the 
state of the original process. When conditions change, or 
upon explicit user input via a simple configuration inter- 
face, a different partition and corresponding binary can be 
substituted for subsequent invocations of the application. 
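A partition database of this kind can be sketched as a map from execution conditions to precomputed migration points. The key (a network link class such as "wifi"), the fallback to the unmodified application when no cloud is reachable or no entry matches, and representing each partitioned binary by its set of migration points are all assumptions for illustration; `PartitionDatabase` is a hypothetical name.

```java
import java.util.*;

/**
 * Hypothetical sketch of the partition database: execution conditions map to
 * the migration points of a partition computed offline.
 */
class PartitionDatabase {
    // Assumed key: a network link class such as "wifi" or "3g".
    private final Map<String, Set<String>> byLink = new HashMap<>();

    /** Stores the partition the offline partitioner produced for one link class. */
    void put(String link, Set<String> migrationPoints) {
        byLink.put(link, Set.copyOf(migrationPoints));
    }

    /**
     * Picks the partition for the current conditions. With no reachable cloud, or
     * for conditions never partitioned for, it falls back to the unmodified
     * application (no migration points).
     */
    Set<String> lookup(String link, boolean cloudAvailable) {
        if (!cloudAvailable) return Set.of();
        return byLink.getOrDefault(link, Set.of());
    }

    public static void main(String[] args) {
        PartitionDatabase db = new PartitionDatabase();
        db.put("wifi", Set.of("C.c")); // made-up entry: off-load c() over WiFi
        db.put("3g", Set.of());        // made-up entry: run locally over 3G
        System.out.println(db.lookup("wifi", true));  // [C.c]
        System.out.println(db.lookup("wifi", false)); // []
    }
}
```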

CloneCloud migration operates at the granularity of a thread. This allows a multi-threaded process to off-load functionality one thread at a time. CloneCloud enables threads, local and migrated, to use (but not migrate) native, non-virtualized features of the platform on which they operate: this includes the network, unvirtualized hardware accelerators, natively implemented API functionality (such as expensive-to-virtualize image processing routines), etc. In contrast, most prior work providing application-layer virtual-machine migration keeps native features and functionality exclusively on the original platform, only permitting the off-loading of pure, virtualized computation.

These two unique features of CloneCloud, thread- 
granularity migration and native-everywhere operation, 
enable new execution models. For example, a mobile application can keep its user interface threads running and
interacting with the user, while off-loading worker threads 
to the cloud if this is beneficial. This would have been im- 
possible with monolithic process or VM suspend-resume 
migration, since the user would have to migrate to the 
cloud along with the code. Similarly, a mobile application 
can migrate a thread that performs heavy 3D rendering op- 
erations to a clone with GPUs, without having to modify 
the original application source; this would have been im- 
possible to do seamlessly if only migration of virtualized 
computation were allowed. 

CloneCloud migration is effected via three distinct components: (a) a per-process migrator thread that assists a process with the mechanics of suspending, packaging, resuming, and merging thread state; (b) a per-node node manager that handles node-to-node communication of packaged threads, clone image synchronization, and provisioning; and (c) a simple partition database that determines what partitioning to use.

The migrator functionality manipulates internal state 
of the application-layer virtual machine; consequently we 
chose to place it within the same address space as the 
VM, simplifying the procedure significantly. The node manager,
in contrast, makes more sense as a per-node component 
shared by multiple applications, for several reasons. First, 
it enables application-unspecific node maintenance, in- 
cluding file-system synchronization between the device 
and the cloud. Second, it amortizes the cost of commu- 
nicating with the cloud over a single, possibly authenti- 
cated and encrypted, transport channel. Finally, it paves 
the way for future optimizations such as chunk-based 
or similarity-enhanced data transfer [26, 37]. Our current
prototype has a simple configuration interface that allows 
the user to manually pick out a partition from the database, 
and to choose new configurations to partition for. We next 
delve more deeply into the design of the distributed exe- 
cution facilities in CloneCloud. 




Figure 7: Migration overview (mobile phone and clone).

4.1 Suspend and Capture 

Upon reaching a migration point, the job of the thread mi- 
grator is to suspend a migrant thread, collect all of its state, 
and pass that state to the node manager for data transfer. 
The thread migrator is a native thread, operating within 
the same address space as the migrant thread, but outside 
the virtual machine. As such, the migrator has the abil- 
ity to view and manipulate both native process state and 
virtualized state. 

To capture thread state, the migrator must collect sev- 
eral distinct data sets: execution stack frames and rele- 
vant data objects in the process heap, and register con- 
tents at the migration point. Virtualized stack frames — 
each containing register contents and local object types 
and contents — are readily accessible, since they are main- 
tained by the VM management software. Starting with lo- 
cal data objects in the collected stack frames, the migra- 
tor recursively follows references to identify all relevant 
heap objects, in a manner similar to any mark-and-sweep 
garbage collector. For each relevant heap object, the mi- 
grator stores its field values, and collects all relevant static 
fields as well (e.g., static class fields). 
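The capture traversal described above is essentially the mark phase of a tracing collector run from a restricted root set: the migrant thread's frame locals plus static fields. The Python sketch below illustrates it on a toy heap; `HeapObject`, `Frame`, and `capture_reachable` are hypothetical names, the breadth-first order is an arbitrary choice, and real Dalvik registers and class metadata are not modeled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)  # eq=False: objects compare by identity, like heap references
class HeapObject:
    cls: str
    fields: dict[str, Any] = field(default_factory=dict)


@dataclass
class Frame:
    method: str                                   # symbolic "Class.method" name
    local_vars: dict[str, Any] = field(default_factory=dict)


def capture_reachable(frames: list[Frame], statics: dict[str, Any]) -> list[HeapObject]:
    """Return every heap object reachable from the frames' locals or from
    static fields, in breadth-first discovery order (the 'mark' phase)."""
    seen: dict[int, HeapObject] = {}   # id(obj) -> obj; dicts keep insertion order
    work: deque = deque()

    def visit(value: Any) -> None:
        # Only references are followed; primitives are captured with their frame.
        if isinstance(value, HeapObject) and id(value) not in seen:
            seen[id(value)] = value
            work.append(value)

    for frame in frames:                # roots 1: local variables of each frame
        for value in frame.local_vars.values():
            visit(value)
    for value in statics.values():      # roots 2: static (class) fields
        visit(value)
    while work:                         # follow references transitively
        for value in work.popleft().fields.values():
            visit(value)
    return list(seen.values())
```

Objects that no root can reach (for example, garbage left by other threads) are never visited, which keeps the capture limited to the migrant thread's working set.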

Captured state must be conditioned for transfer to be
portable. First, object field values are stored in network
byte order to accommodate differing byte orders across
processor architectures. Second, whereas typically a
stack frame contains a local native pointer to the particular 
class method it executes (which is not portable across ad- 
dress spaces or processor architectures), we store instead 
the class name and method name, which are portable. 
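A minimal sketch of the two conditioning steps, assuming a made-up frame layout (length-prefixed UTF-8 names, a 32-bit program counter, and signed 32-bit registers); this is not the hprof-based format the prototype actually uses. In Python's `struct` module the format prefix `!` selects network (big-endian) byte order, so the encoded bytes are the same on any host processor.

```python
import struct


def _pack_str(s: str) -> bytes:
    raw = s.encode("utf-8")
    return struct.pack("!H", len(raw)) + raw        # 16-bit big-endian length prefix


def encode_frame(class_name: str, method_name: str, pc: int, registers: list[int]) -> bytes:
    """Portable frame: symbolic method identity instead of a native method
    pointer, and all integers in network (big-endian) byte order."""
    return (_pack_str(class_name) + _pack_str(method_name)
            + struct.pack("!IH", pc, len(registers))
            + struct.pack(f"!{len(registers)}i", *registers))


def decode_frame(data: bytes) -> tuple[str, str, int, list[int]]:
    off = 0
    names = []
    for _ in range(2):                               # class name, then method name
        (n,) = struct.unpack_from("!H", data, off)
        off += 2
        names.append(data[off:off + n].decode("utf-8"))
        off += n
    pc, count = struct.unpack_from("!IH", data, off)
    off += struct.calcsize("!IH")
    registers = list(struct.unpack_from(f"!{count}i", data, off))
    return names[0], names[1], pc, registers
```

On the receiving side, the class and method names would be looked up in the clone's own loaded classes to recover a local method pointer.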

4.2 Resume and Merge 

As soon as the captured thread state is transferred to the 
target clone device, the node manager passes that state to 
the migrator of a newly allocated process. To resume that 
migrant thread, the migrator must overlay the thread con- 
text over the clean process address space. This overlaying 
process is essentially the inverse of the capture process 
described in Section 4.1. The executable text is loaded (it
can be found under the same filename in the synchronized 
file system of the clone). Then all captured classes and ob- 
ject instances are allocated in the virtual machine's heap, 
updating static and instance field contents with those from 
the captured context. As soon as the address space con- 
tains all the data relevant to the migrant thread, the thread 
itself is created, given the stack frames from the capture, 
the register contents are filled to match the state of the 
original thread at the migration point in the mobile device, 
and the thread is marked as runnable to resume execution. 
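The overlay can be sketched as a two-pass reinstantiation: allocate every captured object first, then fill in fields, so that references (including cycles) resolve regardless of capture order. The tagged `("ref", id)` / `("val", v)` encoding, `Obj`, and `reinstantiate` are hypothetical; loading the executable text, restoring registers, and marking the thread runnable are left out.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)
class Obj:
    cls: str
    fields: dict[str, Any] = field(default_factory=dict)


def reinstantiate(captured_objects, captured_frames):
    """captured_objects: {obj_id: (class_name, {field: tagged})}
    captured_frames:  [(class_name, method_name, {local: tagged})]
    where tagged is ("ref", obj_id) or ("val", primitive)."""
    # Pass 1: allocate every object so any reference can be resolved later.
    heap = {oid: Obj(cls) for oid, (cls, _) in captured_objects.items()}

    def resolve(tagged):
        kind, payload = tagged
        return heap[payload] if kind == "ref" else payload

    # Pass 2: fill in field contents, turning captured IDs into live references.
    for oid, (_, flds) in captured_objects.items():
        heap[oid].fields = {name: resolve(v) for name, v in flds.items()}
    # Rebuild the migrant thread's stack frames on top of the new heap.
    frames = [(cls, meth, {name: resolve(v) for name, v in local_vars.items()})
              for cls, meth, local_vars in captured_frames]
    return heap, frames
```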

As described above, the cloned thread will eventually 
reach a reintegration point in its executable, signaling that 
it should migrate back to the mobile device. Reintegration
is conceptually almost identical to the original
migration: the clone's migrator captures and packages the
thread state, the node manager transfers the capture back 
to the mobile device, and the migrator in the original pro- 
cess is given the capture for resumption. There is, how- 
ever, a subtle difference in this reverse migration direc- 
tion. Whereas in the forward direction — from mobile de- 
vice to clone — a captured thread context is used to create 
a new thread from scratch, in the reverse direction — from 
clone to mobile device — the context must update the orig- 
inal thread state to match the changes effected at the clone. 
We call this process a state merge. 

A successful design for merging states in such a fash- 
ion depends on our ability to map objects at the original 
address space to the objects they "became" at the cloned 
address space; object references themselves are not suffi- 
cient in that respect, since in most application-layer VMs, 
references are implemented as native memory addresses, 
which look different in different processes, across dif- 
ferent devices and possibly architectures, and tend to be 
reused over time for different objects. 

Our solution is an object mapping table, which is only 
used during state capture and reinstantiation in either di- 
rection, and only stored while a thread is executing at a 
clone. We instrument the VM to assign a per-VM unique 
object ID to each data object created within the VM, us- 
ing a local monotonically increasing counter. For clarity, 
we call the ID at the mobile device MID and at the clone 
CID. Once migration is initiated at the mobile device, a 
mapping table is first created for captured objects, filling 
for each the MID but leaving the CID null; this indicates 
that the object has no clone counterpart yet. After instan- 
tiation at the clone, the clone recreates all the objects with 
null CIDs, assigning valid fresh CIDs to them, and re- 
members the local object address corresponding to each 
mapping entry. At this point, all migrated objects have 
valid mappings. 
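A toy model of the forward half of this protocol, assuming object IDs stand in for both the per-VM IDs and the local addresses. `VM`, `Entry`, `capture_at_mobile`, and `instantiate_at_clone` are hypothetical names; the clone's counter starts at 11 only so the IDs resemble the example of Figure 8.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


class VM:
    """Toy VM: every new object gets the next value of a per-VM counter."""

    def __init__(self, first_id: int = 1):
        self._ids = itertools.count(first_id)
        self.objects: dict[int, object] = {}       # local ID -> contents

    def new_object(self, contents: object) -> int:
        oid = next(self._ids)
        self.objects[oid] = contents
        return oid


@dataclass
class Entry:
    mid: Optional[int]   # ID at the mobile device (None: created at the clone)
    cid: Optional[int]   # ID at the clone (None: no clone counterpart yet)


def capture_at_mobile(captured_ids: list[int]) -> list[Entry]:
    """Mobile side: one entry per captured object, MID filled in, CID null."""
    return [Entry(mid=oid, cid=None) for oid in captured_ids]


def instantiate_at_clone(table: list[Entry], clone: VM, contents: dict[int, object]) -> list[Entry]:
    """Clone side: recreate each object whose CID is null and record its fresh CID."""
    for entry in table:
        if entry.cid is None:
            entry.cid = clone.new_object(contents[entry.mid])
    return table
```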

During migration in the reverse direction, objects that 
came from the original thread are captured and keep their 
valid mapping. Newly created objects at the clone have
the locally assigned ID placed in their CID, but get a
null MID. Objects from the original thread that may have
been deleted at the clone are ignored, and no mapping is
sent back for them. During the merge back at the mobile
device, we know which objects should be freshly created
(those with null MIDs) and which objects should be
overwritten with the contents fetched back from the clone
(those with non-null MIDs). "Orphaned" objects that were
migrated out but died at the clone become disconnected
from the thread object roots and are subsequently
garbage-collected. Note that the mapping table is constructed
and used only during capture and reintegration, not during
normal memory operations either at the mobile device or
at the clone.

Figure 8: Object mapping example.
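The merge rules at the mobile device can be sketched as follows. The sketch assumes returned objects arrive as `(mid, cid, contents)` triples with opaque contents; a real merge would also rewrite references inside each object through the returned CID-to-MID map. `merge_at_mobile` and its parameters are hypothetical, and the returned orphan set is informational only, since the garbage collector reclaims those objects on its own.

```python
def merge_at_mobile(heap, next_id, migrated_mids, returned):
    """Merge a returning capture into the mobile heap.

    heap:          {MID: contents} at the mobile device (updated in place)
    next_id:       next unused local ID at the mobile device
    migrated_mids: MIDs that were shipped out in the forward migration
    returned:      [(mid_or_None, cid, contents), ...] captured at the clone
    Returns (new next_id, {CID: MID}, orphaned MIDs)."""
    cid_to_mid = {}
    touched = set()
    for mid, cid, contents in returned:
        if mid is None:                  # null MID: created at the clone, allocate locally
            mid, next_id = next_id, next_id + 1
        heap[mid] = contents             # overwrite existing state, or fill the fresh object
        cid_to_mid[cid] = mid
        touched.add(mid)
    orphans = set(migrated_mids) - touched   # died at the clone; left for the GC
    return next_id, cid_to_mid, orphans
```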

Figure 8 shows an example scenario demonstrating the
use of object mapping. During initial migration, objects 
at addresses 0x01, 0x02, and 0x03 are captured. The mi- 
grator creates the mapping table with three entries, one 
for each object, with the local ID of each object — 1, 2, 
and 3, respectively — in MID, and null CIDs. At the clone, 
the mapping table is stored, updating each entry with the 
local address of each object (0x21, 0x22, and 0x23, re- 
spectively). When the thread is about to return back to 
the mobile device, new entries are created in the table for 
captured objects whose IDs are not already in the CID 
column (objects with IDs 14 and 15). Entries in the ta- 
ble whose CID does not appear in captured objects are 
deleted (the second entry in the figure). Remaining entries 
belong to objects that came from the original thread and 
are also going back (those with CIDs 11 and 13). Note that
memory address 0x22 was reused at the clone after the 
original object was destroyed, but the object has a differ- 
ent ID from the original object, allowing the migrator to 
differentiate between the two. Back at the mobile device, 
new objects are created for entries with null MIDs (bottom 
two entries), objects with non-null MIDs are updated with 
the returned state (first and third entries), and one object 
(with local address 0x02) is left to be garbage-collected. 
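The clone-side table update in this example can be reproduced in a few lines of Python. Because entries are keyed purely by ID, the reuse of address 0x22 causes no confusion. The text names CIDs 11, 13, 14, and 15; the CID 12 used for the deleted second entry is an assumption, and `prepare_return` is a hypothetical name.

```python
def prepare_return(table, captured_cids):
    """Update the clone's mapping table before reintegration.

    table:         [(mid, cid), ...] stored at the clone since the forward trip
    captured_cids: IDs of the objects captured for the trip back, in capture order
    Entries whose CID is not captured belong to objects that died at the clone and
    are dropped; captured objects without an entry were created at the clone and
    get (None, cid)."""
    captured = set(captured_cids)
    kept = [(mid, cid) for mid, cid in table if cid in captured]
    known = {cid for _, cid in kept}
    created = [(None, cid) for cid in captured_cids if cid not in known]
    return kept + created
```

With the table `[(1, 11), (2, 12), (3, 13)]` and captured IDs `[11, 13, 14, 15]`, the result is `[(1, 11), (3, 13), (None, 14), (None, 15)]`, matching the four entries described above.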



4.3 Optimization 

The VM offers a unique opportunity for optimizing the 
amount of information transferred during migration.
Because new processes are forked as copies of a "template"
process — the Zygote, in the Android nomenclature — and 
because that template exists in all booted instances of 
the Android platform, we can avoid transmitting all sys- 
tem heap objects that have not changed since an applica- 
tion was copied from Zygote. This typically saves about 
40,000 object transmissions with every migration opera- 
tion, a significant time and bandwidth overhead reduction. 
Furthermore, even ignoring the transmission cost, some of 
those objects are static or platform-dependent system ob- 
jects, so should not be migrated anyway. 
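A sketch of the filtering rule, assuming the VM can tell which objects were inherited from Zygote and which of those were written after the fork. How such dirty tracking would be done is not described here, so `dirty_ids` and `objects_to_transmit` are hypothetical.

```python
def objects_to_transmit(captured, zygote_ids, dirty_ids):
    """Keep application objects and Zygote objects modified since the fork;
    drop unmodified Zygote objects, which the clone's own Zygote already has."""
    return [oid for oid in captured if oid not in zygote_ids or oid in dirty_ids]
```

References from transmitted objects to skipped Zygote objects still have to be expressed in a platform-independent way, which is the problem the next paragraph addresses.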

We obviate migration for system objects in a manner 
similar to how we map objects to platform-independent 
IDs (Section 4.2), with one major difference: whereas
application processes are first created at the mobile de- 
vice under our control, and then partially copied out and 
back in again as differences from that original single copy, 
Zygote processes are created independently at the mobile 
device and the clone. This creates the challenge of map- 
ping objects from two independent instances of Zygote on 
possibly different platforms. To address the challenge, we 
name each Zygote object according to its class name and
its position in the construction sequence of all objects of
that class. This assumes that objects of each class are
constructed in the same order in Zygote processes on
different platforms, an assumption that holds true in all
Zygote instances we have seen so far.
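The naming scheme can be sketched as follows; `name_zygote_objects` and `translate` are hypothetical helpers, and addresses are plain integers. Only the per-class construction order has to agree between the two Zygote instances; the overall interleaving of classes may differ.

```python
from collections import defaultdict


def name_zygote_objects(creation_log):
    """creation_log: [(local_address, class_name), ...] in construction order.
    Returns {local_address: (class_name, ordinal)}, where ordinal is the number
    of objects of the same class constructed earlier in this Zygote instance."""
    counters = defaultdict(int)
    names = {}
    for addr, cls in creation_log:
        names[addr] = (cls, counters[cls])
        counters[cls] += 1
    return names


def translate(addr, src_names, dst_names):
    """Find the object in another Zygote instance that carries the same name."""
    by_name = {name: a for a, name in dst_names.items()}
    return by_name[src_names[addr]]
```

For example, with a mobile log `[(0x10, "String"), (0x20, "Locale"), (0x30, "String")]` and a clone log `[(0xA0, "String"), (0xB0, "String"), (0xC0, "Locale")]`, address 0x30 (the second `String`) maps to 0xB0 and 0x20 maps to 0xC0.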

5 Implementation 

We implemented our prototype of CloneCloud partitioning
and migration on the "cupcake" branch of the Android
OS [2]. We tested our system on the Android Dev Phone
1 [1] (an unlocked HTC G1 device) equipped with both
WiFi and 3G connections, and on clones running within
the Android x86 virtual machine. We ported an ARM-based
Android virtual machine to x86 for this purpose.⁴
Clones execute on a Dell desktop with a 2.83 GHz CPU
and 4 GB RAM, running Ubuntu 8.04. We modified the
Dalvik VM (Android's application-level, register-based
VM, principally targeted by a Java compiler front-end)
for dynamic profiling and migration. These modifications
comprised approximately 8,000 lines of C code. We also
implemented static analysis, bytecode rewriting, and the
CloneCloud node manager in Java.

For partitioning, we perform all static analysis and
bytecode rewriting on Java bytecode and then convert the
Java bytecode into Dalvik bytecode. We implemented our
static analysis in jchord [5] and modified jchord to support
analysis root methods other than main. We modified
Dalvik VM tracing to trace migration cost and to trace
only the application methods in which we are interested.
Profiling is done both on the phone and on the clone.
We then use Mosek [6] to solve the ILP we defined,
producing a partition for each chosen execution
environment. We use Javassist [4] to rewrite bytecode
to insert suspend and resume points, which are enabled or
disabled at run time depending on policies.

3 We expect our design to work well with other application-level VMs
(e.g., JavaME) given the similarity of the Dalvik VM and the JavaME VM;
testing on other platforms is future work.

4 Several x86-based Android VMs have appeared since.

For migration, we modified the Dalvik VM interpreter.
For the suspend mechanism, we use the Dalvik VM's
implementation of thread suspension. Each VM thread has a
suspend counter that indicates whether there is any pending
suspend request. The thread checks this counter whenever it
finishes executing a bytecode instruction, so that it can be
suspended at a safe point. A thread executing a native frame
also checks the counter when that frame finishes. The
calling (migrator) thread waits on a condition variable until
all other threads are suspended, and then continues its
execution.
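The suspension protocol can be modeled with Python threads: workers poll a suspend count at a safe point after each simulated instruction, and the suspending thread waits on a condition variable until every worker has parked. `ThreadGroup`, `safe_point`, `suspend_all`, and `resume_all` are hypothetical names; the real mechanism lives in the Dalvik interpreter with one counter per thread, whereas this sketch shares a single count for brevity and assumes every worker keeps reaching safe points.

```python
import threading


class ThreadGroup:
    """Toy suspend-at-safe-point protocol for a fixed number of workers."""

    def __init__(self, n_workers: int):
        self.cond = threading.Condition()
        self.suspend_count = 0      # > 0 means a suspend request is pending
        self.parked = 0             # workers currently stopped at a safe point
        self.n = n_workers

    def safe_point(self) -> None:
        """Called by a worker between instructions."""
        with self.cond:
            if self.suspend_count > 0:
                self.parked += 1
                self.cond.notify_all()          # tell the suspender we parked
                while self.suspend_count > 0:   # sleep until resumed
                    self.cond.wait()
                self.parked -= 1

    def suspend_all(self) -> None:
        """Called by the migrator: request suspension, wait until all workers park."""
        with self.cond:
            self.suspend_count += 1
            while self.parked < self.n:
                self.cond.wait()

    def resume_all(self) -> None:
        with self.cond:
            self.suspend_count -= 1
            self.cond.notify_all()
```

A worker loop would call `safe_point()` after each unit of work, and the migrator would bracket its state capture with `suspend_all()` and `resume_all()`.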

We use hprof [3] as a basis for capturing and representing
the execution state. It provides a well-defined format
for storing all the classes and heap objects efficiently.
Since hprof already traverses all the objects and thread
stacks to collect information, we extend its format to also
store the thread stacks and class file paths. We also add
the CID and MID to each object record for the mapping table.
We implemented the object mapping table as a separate
hashtable inside the Dalvik VM. The hashtable is created
only when migration actually starts, and destroyed after
the migration completes. To track object creation and
destruction, we modified the corresponding functions in
the Dalvik VM.
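A sketch of the lifecycle just described, with hypothetical hook names: object IDs are assigned on every creation, while the mapping table (local ID to counterpart ID, `None` until known) exists only inside a migration and drops entries for objects destroyed in the meantime.

```python
import itertools
from contextlib import contextmanager


class ObjectTracker:
    """Hypothetical creation/destruction hooks plus a migration-scoped table."""

    def __init__(self):
        self._next = itertools.count(1)
        self.mapping = None                 # no table outside migration

    def on_create(self) -> int:
        return next(self._next)             # per-VM monotonically increasing ID

    def on_destroy(self, oid: int) -> None:
        if self.mapping is not None:
            self.mapping.pop(oid, None)     # a dead object needs no mapping entry

    @contextmanager
    def migration(self):
        self.mapping = {}                   # created when migration starts
        try:
            yield self.mapping
        finally:
            self.mapping = None             # destroyed when migration ends
```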

Migration is currently initiated and terminated by the
(modified) application. To pass control from the
application to the migrator thread, we define two CloneCloud
APIs: ccStart() indicates the start point of the
migration, and ccStop() defines the end point of the
migration. During partitioning, we insert these function calls
into the original application bytecode. The application thread
calling these operations notifies the migrator thread inside
Dalvik, and suspends itself. Once the migrator thread gets
the notification and gains control, it checks with the policy
engine whether to migrate. If the decision is to migrate, the
migrator handles the rest of the migration.
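The control handoff can be sketched with a queue: the application thread posts a request at `cc_start` and blocks until the migrator thread answers with the policy decision. `Migrator`, `cc_start`, and the `policy` callable are hypothetical stand-ins for `ccStart()` and the policy engine; the actual suspend, capture, and transfer steps, and the matching `ccStop()`, are omitted.

```python
import queue
import threading


class Migrator(threading.Thread):
    """Hypothetical migrator thread that answers migration requests."""

    def __init__(self, policy):
        super().__init__(daemon=True)
        self.policy = policy                  # callable: migration point -> bool
        self.requests = queue.Queue()

    def run(self):
        while True:
            point, reply = self.requests.get()
            decision = self.policy(point)
            # A real migrator would suspend, capture, and ship the calling
            # thread here when decision is True; this sketch only replies.
            reply.put(decision)

    def cc_start(self, point):
        reply = queue.Queue(maxsize=1)
        self.requests.put((point, reply))
        return reply.get()                    # the caller blocks until the migrator decides
```

For example, `Migrator(policy=lambda p: p == "scanAll")` would off-load only at the hypothetical `"scanAll"` migration point and let every other point continue locally.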

6 Evaluation 

For the evaluation of our prototype, we implemented three
mobile applications. We ran each application either
monolithically on an Android Dev Phone 1, representing the
status quo, or optimally partitioned for one of two execution
settings: one with WiFi connectivity and one with 3G
connectivity.

Table 1: Execution times of virus scanning, image search, and behavior profiling applications. For each application
we show three rows, one per input size; each application measures input size differently. For each input size, the
data shown include (from left to right) execution time at the phone alone ("monolithic" execution), execution time
at the clone alone, CloneCloud execution time, partitioning choice, and speedup (for 3G connectivity), and the same
information for WiFi connectivity.

The applications we consider are a virus scanner, image
search, and privacy-preserving targeted advertising;
we briefly describe each next. The virus scanner scans
the contents of the phone file system against a library of
1000 virus signatures, one file at a time. We vary the
total size of the file system between 100 KB and 10 MB.
The image search application finds all faces in images
stored in the phone file system. We use a face-detection
Android library that returns the mid-point between the
eyes, the distance between the eyes, and the pose of every
face detected. We only use images smaller than 100 KB
each, due to memory limitations of the Android
face-detection library. We vary the number of images from 1 to
100. The privacy-preserving targeted advertising
application uses behavioral tracking across websites to infer the
user's preferences, and selects ads according to the
resulting model; by doing this tracking at the user's device,
privacy can be protected (see Adnostic [38]). We implement
Adnostic's web page categorization on the mobile device,
which maps a user's keywords to one of the hierarchical
interest categories, down to nesting levels 3-5, from the
DMOZ open directory [7]. The application computes the
cosine similarity between user interest keywords and
predefined category keywords.
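The categorization step can be sketched as a bag-of-words cosine similarity, cos(u, c) = (u · c) / (‖u‖ ‖c‖). The category names and keyword lists below are invented for illustration and are not taken from DMOZ; `cosine` and `categorize` are hypothetical helpers, and raw term counts stand in for whatever weighting Adnostic actually uses.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def categorize(user_keywords: list[str], categories: dict[str, list[str]]) -> str:
    """Pick the category whose keyword vector is most similar to the user's."""
    user = Counter(user_keywords)
    return max(categories, key=lambda c: cosine(user, Counter(categories[c])))
```

With user keywords `["java", "code", "match", "debug"]`, a hypothetical programming category sharing three of them scores 0.75, while a soccer category sharing only `"match"` scores 0.25.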

Table 1 collects all our results for the three applications,
under three different workload sizes each. The third
column shows the execution time for each experiment when
running on the phone monolithically. As a point of
comparison, the fourth column shows the execution time when the
application runs on the clone in its entirety. CloneCloud
cannot achieve this performance, since in practice some
part of the application must run on the phone, and there
is non-trivial overhead in migrating the remainder to the
clone. However, the comparison of these two columns,
shown in the maximum speedup column that follows,
captures the speedup opportunity available due to the
disparity between phone and cloud computation resources
when offloading computation to a single clone.

We now turn to the choices CloneCloud makes when
executing each application using the 3G network or the
WiFi network. The execution times reported are the
average of five runs. In the 3G case, communication is
performed via an SSH tunnel between the phone and the
clone, to punch through our lab firewall. Our 3G
connection averaged a latency of 415 ms, a download bandwidth of
0.91 Mbps, and an upload bandwidth of 0.16 Mbps, while
our WiFi connection had a latency of 66 ms, a download
bandwidth of 7.29 Mbps, and an upload bandwidth of 3.06
Mbps.⁵

5 We used ping to measure the average latency from the phone to our lab
firewall, and we used Xtremelabs Speedtest, downloaded from the Android
Market, to measure download and upload bandwidth.

An obvious difference between the two execution
environments is that CloneCloud keeps more workloads local
in the 3G case (5 out of 9) than in the WiFi case
(2 out of 9). This can be explained by the overhead
differences between the two networks. Migration
costs about 10-15 seconds in the WiFi case, but rises
to 60 seconds in the 3G case, due to the greater latency
and lower bandwidth of the latter. In both cases,
migration costs include a network-independent thread-merge
cost (patching up references in the running address space
from the migrated thread) and the network-specific
transmission of the thread state. The former dominates the
latter for WiFi, but is dominated by the latter for 3G. A
secondary effect in the results is that larger workloads
benefit more from off-loading: the migration cost is
amortized over a larger computation at the clone, which
receives a significant speedup. Nevertheless, the WiFi case
displays significant speedups in all applications: 14x,
21x, and 12x for the largest workload of each of the three
applications, for a completely automatic modification of
the application binary without programmer input. We expect
these benefits to increase with optimizations targeting the
network overheads (in particular, 3G network overheads),
such as redundant transmission elimination and compression.
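The trade-off above can be made concrete with a back-of-the-envelope model. This is a deliberate simplification, not CloneCloud's ILP-based partitioner: it assumes transfer time is latency plus size over bandwidth, and that off-loading pays off only when clone execution plus migration cost beats local execution. The bandwidth, latency, and migration-cost figures come from the measurements above; the state size and workload times in the usage note are hypothetical.

```python
def transfer_seconds(state_bytes: int, bandwidth_mbps: float, latency_ms: float) -> float:
    """One-way transfer estimate: propagation latency plus serialization time."""
    return latency_ms / 1000 + state_bytes * 8 / (bandwidth_mbps * 1e6)


def should_offload(t_phone: float, t_clone: float, migration_cost: float) -> bool:
    """Off-load only if remote execution plus migration beats running locally."""
    return t_clone + migration_cost < t_phone


# Measured uplink figures from the text: (upload Mbps, latency ms).
UPLINK = {"3G": (0.16, 415), "WiFi": (3.06, 66)}
```

For instance, uploading a hypothetical 1 MB (10^6-byte) thread state takes about 50.4 s over the measured 3G uplink but about 2.7 s over WiFi. With an assumed 5 s of clone execution, a workload taking 40 s on the phone would be off-loaded under a 15 s WiFi migration cost but kept local under a 60 s 3G cost.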

Next, we analyze the time to run the partitioning
framework. First, we report the time to perform partitioning
analysis for the image search application; as before, we
report the average of five runs. We profile 35
methods in the application program. Because we profile
only methods that appear in the application itself,
profiling overhead is low. We profile the application on the
phone and on the clone. Profiling execution time takes
29.4 seconds on the phone and 1.2 seconds on the clone.
Profiling migration cost takes 98.4 seconds on the phone.
Running static analysis using jchord then takes 19.4
seconds with Sun JDK 1.5.0_16 on the desktop machine.
Generating an optimizer (ILP) script from the profile trees
and constraints and solving the generated ILP take less
than one second.
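Summing the reported component times gives a rough end-to-end partitioning cost for image search, under the assumption (not stated in the text) that the steps run one after another rather than overlapping:

```latex
T_{\text{partition}} =
  \underbrace{29.4}_{\text{phone profiling}}
+ \underbrace{1.2}_{\text{clone profiling}}
+ \underbrace{98.4}_{\text{migration-cost profiling}}
+ \underbrace{19.4}_{\text{static analysis}}
+ t_{\text{ILP}}
= 148.4\,\text{s} + t_{\text{ILP}},
\qquad t_{\text{ILP}} < 1\,\text{s}
\;\Rightarrow\; T_{\text{partition}} < 149.4\,\text{s} \approx 2.5\ \text{min}
```

Migration-cost profiling on the phone accounts for about two thirds of this total.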

7 Related Work 

CloneCloud builds on previous research in automatic
partitioning, migration, and remote execution, and
combines these technologies in a non-trivial way.
First, it uses a partitioning framework that combines static
program analysis with dynamic program profiling. It
partitions at the method level, allows methods that access
native state to be placed remotely if they satisfy the
constraints generated by the partitioning framework, and
chooses partitions that optimize specific metrics. Second,
CloneCloud migrates specific threads together with their
relevant execution state, including reachable heap objects.
It migrates on demand, only when doing so is beneficial,
and can merge migrated state back into the original process.

Partitioning We first summarize previous work on
partitioning of distributed systems. Coign [21] automatically
partitions a distributed application composed of Microsoft
COM components to reduce the communication cost of
partitioned components. The application must be structured to
use COM components, partitioning points are COM
boundaries, and the work focuses on static partitioning
and assumes that a COM component can be placed
anywhere. Wishbone [27] and Pleiades [23] compile a central
program into multiple code pieces with communication
stubs, mostly for sensor networks. Wishbone [27]
takes an acyclic dataflow graph of operators written in a
high-level stream-processing language and partitions the
graph between a server and a set of embedded nodes for
sensor computing applications. It uses a compiler that
generates partitioned source code with communication stubs
based on profiles of CPU and network bandwidth
consumption. Pleiades [23] compiles a central program, written
in an extended C language with a model of accessing the
entire network, into multiple units that run on sensor
nodes. MAUI [14] partitions applications using dynamic
profiling and optimization, focusing on energy consumption.
For offloaded execution, it performs method shipping with
relevant heap objects. J-Orchestra [36] creates partitioned
applications automatically with a compiler that classifies
classes as anchored unmodifiable, anchored modifiable, or
mobile. After the analysis, it rewrites all references into
indirect references (i.e., references to proxy objects) for a
cluster of machines, and places classes with location
constraints (e.g., ones with native state constraints) at the
proper locations. Finally, for distributed execution of
partitioned applications, it relies on the RMI middleware.

There are also Java program partitioning systems for
mobile devices [20, 25, 28], whose limitation is that only
Java classes without native state can be placed remotely.
The general approach is to partition Java classes into
groups using adapted MINCUT heuristic algorithms to
minimize the interactions between partitions. Different
proposals also consider additional objectives such as
memory, CPU, or bandwidth. Unlike our work, this previous
work does not consider partitioning constraints, partitions
at the coarse granularity of classes, and focuses on static
partitioning.

On a related front, Links [13], Hops [33], and UML-based
Hilda [40] aim to statically partition a client-server
program written in a high-level functional language or
a high-level declarative language into two or three tiers.
Yang et al. [39] examine partitioning of programs written
in Hilda based on cost functions for optimizing user
response time. Swift [11] statically partitions a program
written in the Jif programming language into client-side
and server-side computation. Its focus is on achieving
confidentiality and integrity of the partitioned program with
the help of security labels that programmers annotate in
the program.

Migration There has been previous work on supporting
migration in Java. MERPATI [35] provides JVM migration
by checkpointing the entire heap and all the
threads with their execution environment (the call stack,
the local variables, and the operand stacks) and resuming
from a checkpoint. In addition, there have been different
approaches to distributed Java virtual machines
(DJVMs). They assume a cluster environment where
homogeneous machines are connected via a fast interconnect,
and try to provide a single system image to users. One
approach is to build a DJVM upon a cluster-enabled
infrastructure below the JVM. Jessica [24] and Java/DSM E3
rely on page-based distributed shared memory (DSM)
systems to solve distributed memory consistency
problems. To address the overhead induced by false sharing
in page-based DSM systems, Jessica2 [43] proposes an
object-based solution. cJVM [8] implements a DJVM by
modifying the JVM to support method shipping to remote
objects with proxy objects, creating threads remotely,
and supporting distributed stacks. Object migration
systems such as Emerald [22] move objects to the sites
running threads that request access to those objects. In
contrast, CloneCloud migration chooses portions of thread
execution to offload, moves only their relevant, sufficient
execution state (thread stack and relevant reachable heap
objects), and supports merging between existing state and
migrated execution state.

Remote execution Remote execution of resource-intensive
applications for resource-poor hardware is a
well-known approach in mobile/pervasive computing.
All such remote execution work relies on applications that
are carefully designed and pre-partitioned between local
and remote execution. Typical remote execution systems
run a simple visual or audio output routine at the mobile
device and computation-intensive jobs at a remote server
[19][15][16] USEUlllTJ. Rudenko et al. El and Flinn and
Satyanarayanan [16] explore saving power via remote
execution. Cyber foraging [9][10] uses surrogates (untrusted
and unmanaged public machines) opportunistically to
improve the performance of mobile devices. For example,
both data staging ifTTl and Slingshot [34] use surrogates.
In particular, Slingshot creates a secondary replica of a
home server at nearby surrogates. ISR [31] provides the
ability to suspend on one machine and resume on another
machine by storing virtual machine (e.g., Xen) images in
a distributed storage system.

Finally, our work takes a step towards achieving the
vision presented in an earlier workshop paper [12], where
we made the case for augmented smartphone execution 
through clones running in the cloud. In this paper, we have 
presented the concrete design, implementation, and eval- 
uation of our prototype system for such execution. 

8 Discussion and Future Work 

CloneCloud is limited in some respects by its inability to 
migrate native state and to export unique native resources 
remotely. Conceptually, if one were to migrate at a point 
in the execution in which a thread is executing native 
code, or has native heap state, the migrator would have to 
collect such native context for transfer as well. However, 
the complexity of capturing such information in a portable 
fashion (and the complexity of integrating such captures 
after migration) is significantly higher, given processor ar- 
chitecture differences, differences in file descriptors, etc. 
As a result, CloneCloud focuses on migrating at execution
points where no native state (in the stack or the heap)
needs to be collected and migrated.

A related limitation is that CloneCloud does not vir- 
tualize access to native resources that are not virtualized 
already and are not available on the clone. For example, 
if a method accesses the camera or GPS on the mobile
device, CloneCloud requires that method to remain pinned
on the mobile device. In contrast, networking hardware or 
an unvirtualized OS facility (e.g., Android's image pro- 
cessing API) are available on both the mobile device and 
the clone, so a method that needs to access them need 
not be pinned. An alternative design would have been to 
permit migration of such methods, but enable access to 
the unique native resource via some RPC-like mechanism. 
We consider this alternative a complementary point in the 
design space, and plan to pursue it in conjunction with 
thread-granularity migration in the future. 

The system presented in this paper allows only perfunc- 
tory concurrency between the unmigrated threads and the 
migrated thread; pre-existing state on the mobile device 
remains unmodifiable until the migrant thread returns. As 
long as local threads only read existing objects and mod- 
ify only newly created objects, they can operate in tandem 
with the clone. Otherwise, they have to block. A promis- 
ing direction, whose benefits may or may not be borne 
out by the associated complexity, lies in extending this ar- 
chitecture to support full concurrency between the mobile 
device and clones. To achieve this, we need to add thread 
synchronization, heap object synchronization, on-demand 
object paging to access remote objects, etc. 
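A sketch of the current concurrency rule as a write barrier, assuming local threads call a hypothetical `before_write` hook before modifying an object: reads never block, writes to objects created after migration began proceed immediately, and writes to pre-existing objects wait until the migrant thread is reintegrated. `MigrationGuard` and its methods are illustrative names only.

```python
import threading


class MigrationGuard:
    """Hypothetical write barrier active while a thread executes at the clone."""

    def __init__(self, pre_existing_ids):
        self.frozen = set(pre_existing_ids)   # objects that existed at migration start
        self.in_flight = True
        self.cond = threading.Condition()

    def before_write(self, oid) -> None:
        """Block a local writer only if it targets pre-existing state."""
        with self.cond:
            while self.in_flight and oid in self.frozen:
                self.cond.wait()

    def reintegrated(self) -> None:
        """Called after the migrant thread's state has been merged back."""
        with self.cond:
            self.in_flight = False
            self.cond.notify_all()
```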

While in this paper we assume that the environment in 
which we run clone VMs is trusted, the future of roam- 
ing devices that use clouds where they find them demands 
a more careful approach. For instance, many have envi- 
sioned a future in which public infrastructure machines 
such as public kiosks [19] and digital signs are widely
available for running opportunistically off-loaded compu- 
tations. We plan to extend our basic system to check that 
the execution done in the remote machine is trusted. Auto- 
matically refactoring computation around trusted features 
on the clone is an interesting research question. 

In our related position paper [12], we discussed a rich
design space for automatic off-loading. Our work here 
covers some aspects of primary and background augmen- 
tation, and we would like to continue to explore hard- 
ware augmentation and multiplicity augmentation that 
uses multiple copies of the system image executed in dif- 
ferent ways. 

9 Conclusion 

This paper takes a step towards seamlessly interfacing be- 
tween the mobile and the cloud in the context of mobile 
cloud computing. Our system overcomes design and
implementation challenges to achieve basic augmented
execution of mobile applications on the cloud, representing
the wholesale transfer of control from the device to
the clone and back. We combine partitioning, migration 
with merging, and on-demand instantiation of partition- 
ing to address these challenges. Our prototype delivers up 
to 21.2x speedup for applications we tested, without pro- 
grammer involvement, demonstrating feasibility for the 
approach, and opening up a path for a rich research agenda 
in hybrid mobile-cloud systems. 